fix: optional clear cache between microbatch iterations #1074
Signed-off-by: Yubo Gao <[email protected]>
ℹ️ File Consistency Check
Checks based on commits: 9a9f638, 852e253, cf551e5, e931727, 76c0a43 (PR #1074).
✅ DTensor Policy Worker Synchronization Check
Both DTensor policy worker files were modified in this PR.
Please ensure that the changes are consistent between both files where applicable. This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.
What does this PR do?
Mitigates the performance overhead of clearing the cache, which was introduced in #926, by making the clear-cache behaviour optional.
Issues
This PR resolves #1036.
Usage
This PR introduces an additional option for dtensor_config that clears the cache between microbatch iterations. When the option is enabled, a warning is displayed before a training iteration, indicating that cache clearing has been turned on and that it may add performance overhead.
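The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function `run_microbatches`, the option name `clear_cache_every_n_steps`, and the injectable `clear_cache` callable are all hypothetical names chosen for the example. The default callable uses `torch.cuda.empty_cache()` when PyTorch with CUDA is available and does nothing otherwise.

```python
import warnings
from typing import Callable, Iterable, Optional


def _default_clear_cache() -> None:
    """Release cached GPU memory if PyTorch with CUDA is available; otherwise do nothing."""
    try:
        import torch
    except ImportError:
        return
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


def run_microbatches(
    microbatches: Iterable,
    step_fn: Callable[[object], None],
    clear_cache_every_n_steps: Optional[int] = None,  # hypothetical option name
    clear_cache: Callable[[], None] = _default_clear_cache,
) -> int:
    """Run step_fn on each microbatch and return how many times the cache was cleared."""
    if clear_cache_every_n_steps:
        # Warn once up front, since clearing the cache can slow training down.
        warnings.warn(
            "Cache clearing between microbatches is enabled; "
            "this may add performance overhead."
        )
    cleared = 0
    for i, microbatch in enumerate(microbatches, start=1):
        step_fn(microbatch)
        # Clear only when the option is set and the step count hits the interval.
        if clear_cache_every_n_steps and i % clear_cache_every_n_steps == 0:
            clear_cache()
            cleared += 1
    return cleared
```

Leaving the option unset (the default in this sketch) skips cache clearing entirely, which avoids the overhead; setting it to 1 clears the cache after every microbatch.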
Below is the speed comparison of the default SFT experiment with the cache cleared after every microbatch vs. with no cache clearing:

Before your PR is "Ready for review"
Pre checks:
Additional Information